The next generation of telescopes and instruments is facilitating our understanding
of the Universe by producing data at a pace that beats all projections, and
astronomers today face an avalanche of data like never before.
To cope with this problem and come up with a reliable and innovative solution,
Data Centers were created in various locations and the concept of Virtual
Observatories was elaborated. Based at the National Taiwan Normal University, the
Taiwan Extragalactic Astronomical Data Center plans to join the global effort by
offering 1Pb of data storage dedicated to extragalactic astronomy by 2015.
Continuing individual efforts made in Taiwan over the past few years, this is the
first stepping-stone towards the building of a National Virtual Observatory.

Besides the common functionalities generally provided by data centers, our
goal is to offer "on-the-fly" photometry measurements from publicly available
surveys: a unique way of cross-matching information. We will also offer
access to raw and reducible data available from archives worldwide, a goldmine of
under-exploited information. Finally, we will offer our own specific analysis
tools, available on-line through a user-friendly interface.

Purchased very recently, the current Data Storage Unit can hold up to 50Tb of
data. In the first phase, we will focus on multiband catalog cross-matching and
on making the latest extragalactic datasets available to the worldwide
community; this service should be fully functional in 2011.

INTRODUCTION 

Plans for the next generation of telescopes and instruments are becoming very ambitious,
requiring a massive aggregation of resources and expertise. Projects such as ALMA and the Thirty
Meter Telescope will allow us to push our exploration forward to the edge of the Universe and
help us survey the whole sky at a pace never imagined before. Furthermore, to gather the maximum
possible information, these projects cover the sky in different wavebands, from gamma- and X-rays
through optical and infrared to radio. Such observations require a wide range of expertise and
information, which is sometimes split and difficult to bridge. These breakthroughs in telescopes,
detectors, and computer technology allow astronomical instruments to produce several terabytes of
images and catalogs. Astronomy today faces a data avalanche. It is already almost easier to "dial up"
a part of the sky than to wait many months for access to a telescope. With the advent of inexpensive
storage technologies and the availability of high-speed networks, the concept of multi-terabyte
on-line databases interoperating seamlessly is no longer outlandish. More and more catalogs are now
interlinked, crossing wavelength boundaries. Furthermore, the new generation of survey telescopes
(Pan-STARRS, LSST, etc.) will image the entire sky every few days and yield petabytes of data.

Over the past decade the concept of the Virtual Observatory (VO) has emerged rapidly to address
challenges relating to data management, analysis, distribution and interoperability. The VO is a
system in which the vast astronomical archives and databases around the world, together with analysis
tools and computational services, are linked together into an integrated facility. By providing the
tools to assemble and explore massive data sets quickly, the VO facilitates and enables a broad range
of science. Amalgamating massive data sets over a broad range of wavelengths, spatial scales, and
temporal intervals is especially fruitful. VO-based studies include systematic explorations of the
large-scale structure of the Universe, the structure of our Galaxy, AGN populations in the Universe,
and variability on a range of time scales, wavelengths, and flux levels. The VO also enables searches
for rare, unusual, or even completely new types of astrophysical objects and phenomena. For the first
time, we are able to compare the results of massive numerical simulations with equally voluminous
datasets. The International Virtual Observatory Alliance (IVOA) was formed in June 2002 with a
mission to "facilitate the international coordination and collaboration necessary for the development
and deployment of the tools, systems and organizational structures necessary to enable the
international utilization of astronomical archives as an integrated and interoperating virtual
observatory." The IVOA now comprises 15 national and three regional/agency VO projects. In East Asia,
Japan and China are members of the IVOA and are developing their own national VOs.



THE TAIWAN EXTRAGALACTIC ASTRONOMICAL DATA CENTER 
Philosophy

Data centers contribute to global efforts in different ways: data archives, with a particular
emphasis on 'science ready' data; added-value databases and services; tools, software suites and
algorithms, for instance for data visualization, data analysis and data mining; thematic services to
help solve a well-defined science question; and full data analysis or research environments. New
types of services are emerging, in particular theoretical services providing modeling results or
matching models with observations. The main role of Data Centers is not only to provide a good
quality service to the community, but also to provide added value based on expertise. This requires
shared efforts not only in developing software and database environments, but also in crossing
information between observational projects of diverse nature and of different wavebands.

The Taiwanese astronomical community needs to step into the VO era. Even though the results of
the efforts made by the VO community worldwide are meant to be public, Taiwan must take part in
them, both to prepare our next generation of astronomers, who will require such skills, and to avoid
being relegated to a follower's position. To help us realize this vision for the future, NTNU funded
in 2010 the creation of the first Taiwan-based Data Center dedicated to extragalactic astronomy.
Several individual efforts have been conducted over the past few years to develop the VO in Taiwan,
and the Taiwan Extragalactic Astronomical Data Center (TWEA-DC) is the obvious next leap forward. By
having a fully functioning data center around which the community can work, Taiwan will be ready
to join the international VO community. The efforts conducted by the VO community are already at a
very advanced stage; we will therefore build on their latest developments and will include the
available applications developed by the international community over the past decades.

Mission and Goals



One of the major goals of the TWEA-DC is to train the next generation of astronomers, who will
have to keep pace with the changing face of modern astronomy. Moving into the VO era will
have a dramatic impact on the existing skill base of young astronomers. By making a
move in this direction now, Taiwan will prepare the next generation of scientists to face this
technological revolution. Astronomy is now based on large datasets covering a broad wavelength
range. The challenge is to aggregate the information and generate a final product that will bridge
different areas of expertise and thereby enhance the scientific output. Large amounts of data
storage are therefore required locally to enable fast access to images and catalogs. To reach this
goal with such gigantic amounts of data, new tools for data analysis have to be developed. The
TWEA-DC will help to fulfill this mission by focusing on three main goals:

• Matching different datasets: several million objects must be cross-matched on a very short
timescale, requiring new algorithms and new concepts. Direct matching of images would actually be
ideal, and performing "on-the-fly" photometry and extraction is the way forward for future
algorithms. However, as a first approach we plan to cross-match catalogs. Thanks to our algorithm,
we are offering "on-the-fly" matching, enabling a new way of dealing with datasets.

• User-friendly portal to archived raw and reducible data: standard data reduction is not always
optimal, and dedicated processes are sometimes required. Centralized access to raw data will
ease their exploitation.

• Specialized and dedicated analysis tools: we are planning to develop and distribute new analysis
tools through a user-friendly interface. Some of our tools are already ready for implementation,
such as a very fast online tool for correlation function measurements.
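
As an illustration of the first goal, a positional catalog cross-match can be sketched in a few lines of Python. This is only a minimal sketch and not the TWEA-DC code: it swaps in a simple hash grid on unit vectors in place of a hierarchical index, and the names `to_unit` and `cross_match`, the `(ra, dec)` tuple layout, and the default radius are all assumptions made for the example.

```python
import math

def to_unit(ra_deg, dec_deg):
    """Convert equatorial coordinates in degrees to a 3D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def cross_match(cat_a, cat_b, radius_arcsec=1.0):
    """Pair each (ra, dec) source of cat_a with its nearest cat_b source within the radius.

    Returns a list of (index_a, index_b, separation_arcsec).
    """
    # Chord length on the unit sphere that corresponds to the matching radius.
    chord = 2.0 * math.sin(math.radians(radius_arcsec / 3600.0) / 2.0)

    # Hash cat_b into cubic cells of side `chord` (hypothetical layout, sparse dict).
    vec_b = [to_unit(ra, dec) for ra, dec in cat_b]
    grid = {}
    for j, v in enumerate(vec_b):
        cell = tuple(math.floor(c / chord) for c in v)
        grid.setdefault(cell, []).append(j)

    matches = []
    for i, (ra, dec) in enumerate(cat_a):
        v = to_unit(ra, dec)
        cx, cy, cz = (math.floor(c / chord) for c in v)
        best = None
        # Two points closer than `chord` always lie in the same or adjacent cells.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        d = math.dist(v, vec_b[j])
                        if d <= chord and (best is None or d < best[1]):
                            best = (j, d)
        if best is not None:
            sep = math.degrees(2.0 * math.asin(min(1.0, best[1] / 2.0))) * 3600.0
            matches.append((i, best[0], sep))
    return matches
```

Because only the neighboring cells are inspected, the cost grows roughly linearly with catalog size instead of quadratically, which is the property that makes matching several million objects tractable.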

CATALOGS, IMAGES AND "ON-THE-FLY" MATCHING TOOL 

As an initial service, the TWEA-DC provides the community with a tool to cross-match multiband
datasets. Astronomers have to investigate innovative solutions to deal with the gigantic number of
objects provided by new datasets. Services as simple as a database query or matching on the sky
can be a real problem. One of the most recent solutions is the hierarchical subdivision of the
celestial sphere using spherical triangles. Algorithms of this kind, based on the quadtree, are
nowadays widely used to query astronomical databases; see, for instance, the Hierarchical
Triangular Mesh (HTM) algorithm (see Fig. 1). HTM is now a standard spatial index for astronomy
and is used in various surveys (DES, LSST, etc.). Based on this technology, we developed a
very fast code that allows "on-the-fly" matching. This strategy enables users to tune their
own match, and to upload private catalogs and match them against public catalogs in the database.
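
The idea of a hierarchical triangular subdivision can be sketched as follows. This is a simplified illustration, not the HTM reference implementation nor our matching code: the octahedron seeding and four-way split follow the general HTM construction, but the function name `trixel_name`, the child numbering, and the tie-breaking on triangle edges are assumptions of this sketch.

```python
import math

# Octahedron vertices used to seed the subdivision.
_V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
# Eight root triangles, each listed counter-clockwise as seen from outside the sphere.
_ROOTS = {
    "S0": (1, 5, 2), "S1": (2, 5, 3), "S2": (3, 5, 4), "S3": (4, 5, 1),
    "N0": (1, 0, 4), "N1": (4, 0, 3), "N2": (3, 0, 2), "N3": (2, 0, 1),
}

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _mid(a, b):
    """Midpoint of the great-circle arc between two unit vectors."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(_dot(s, s))
    return (s[0] / n, s[1] / n, s[2] / n)

def _score(p, tri):
    """Non-negative when p lies inside the spherical triangle, negative otherwise."""
    a, b, c = tri
    return min(_dot(_cross(a, b), p), _dot(_cross(b, c), p), _dot(_cross(c, a), p))

def _children(tri):
    """Split a triangle into four by joining the midpoints of its edges."""
    v0, v1, v2 = tri
    w0, w1, w2 = _mid(v1, v2), _mid(v0, v2), _mid(v0, v1)
    return [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]

def trixel_name(ra_deg, dec_deg, depth):
    """Return the name of the triangle containing the position at the given depth."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    p = (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    roots = [(name, tuple(_V[i] for i in idx)) for name, idx in _ROOTS.items()]
    name, tri = max(roots, key=lambda item: _score(p, item[1]))
    for _ in range(depth):
        kids = _children(tri)
        k = max(range(4), key=lambda i: _score(p, kids[i]))
        name += str(k)
        tri = kids[k]
    return name
```

Two positions that share a long name prefix are close on the sky, so candidate matches can be found by comparing names (or the equivalent integers) rather than computing all pairwise separations. A full matcher also has to inspect neighboring triangles for sources close to an edge, which is omitted here.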

To drastically increase the speed of matching, new algorithms should not be based solely
on the position on the sky, but should make the best of the wide range of available parameters
(fluxes, shapes, compactness, etc.). Our ultimate goal would be to use the images directly to
perform the matching process. However, the challenges to be faced (different nature of the data,
PSF, etc.) imply a long-term development in close collaboration with computer scientists.

A PORTAL TO RAW AND REDUCIBLE DATA 

Data centers usually focus on providing fully reduced and calibrated data sets, processed
through conventional data reduction pipelines. However, for some specific studies, astronomers want
to process the data using alternative data reduction algorithms. This is the case, for instance,
when extracting very low-surface brightness features from images (low-surface brightness galaxies,
tidal residuals around local galaxies, etc.). This requires a fully independent data reduction
process, especially in the near-infrared, for which the sky subtraction algorithms usually remove
these features. Some data/archive centers offer access to raw and calibration data, but only for a
limited number of datasets/telescopes. We propose to create a portal that will give access, in one
single interface, to an exhaustive list of raw and reducible data produced by different
facilities/telescopes.

Figure 1. Hierarchical Triangular subdivision. The sky is divided into a succession
of spherical triangles, which can be arranged in the form of a tree. The smallest
elements will contain only one object. Such a data structure enables very fast queries
and cross-matches.

SOME HIGH-LEVEL "ON-THE-FLY" ANALYSIS TOOLS 

One important mission of a Data Center is to provide the community with user-friendly tools
to perform high-level analysis remotely. For instance, the Centre de Données astronomiques de
Strasbourg (CDS) has developed a large number of web-based applications to access and visualize
astronomical datasets. Some basic tools such as plotting and database queries have to be part of the
package offered by the TWEA-DC, as well as access to some of the wide range of tools already
developed by the VO community (for instance TOPCAT*). However, our primary goal is to develop
our own set of dedicated analysis tools. Beyond the matching procedure, we want to provide the
community with a service that will take full advantage of our server. Indeed, the amount of data
available requires users to conduct their analyses remotely.

EAG: galaxy 2-point angular correlation in a blink 

The first application available will measure the angular 2-point correlation function for very
large samples on a very short timescale. The correlation function quantifies the clustering of
galaxies, provides vital information on the evolution of large-scale structure and on galaxy
formation, and also probes cosmological parameters. However, performing such a study on very large
samples may require a long computing time. Given the size of the next generation of sky surveys,
improved algorithms have to be developed. In practice, the 2-point angular correlation function
simply measures the excess of pairs in the data sample compared to a random distribution at
different scales. EAG is based on an algorithm similar to that of our matching code: the positions
are distributed in a quadtree structure, and double-walks are performed to count the pairs. We have
developed a very fast code, coupled with a web interface, that can determine the 2-point correlation
function of several million objects in a few minutes. EAG will be available to the community by the
end of 2011.
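
The "excess of pairs" definition can be made concrete with a short sketch. The estimator used by EAG is not specified here, so the simplest one, w = DD/RR - 1 with normalized pair counts, is assumed; the pair counting below is a brute-force double loop rather than a quadtree walk, and the function names are hypothetical.

```python
import bisect
import math

def _unit(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def pair_counts(points, edges_deg):
    """Count unique pairs of (ra, dec) points in each angular-separation bin."""
    vecs = [_unit(ra, dec) for ra, dec in points]
    counts = [0] * (len(edges_deg) - 1)
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            cosang = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            theta = math.degrees(math.acos(max(-1.0, min(1.0, cosang))))
            k = bisect.bisect_right(edges_deg, theta) - 1
            if 0 <= k < len(counts):
                counts[k] += 1
    return counts

def angular_correlation(data, randoms, edges_deg):
    """Assumed estimator: w = norm * DD / RR - 1 in each bin (NaN where RR is empty)."""
    dd = pair_counts(data, edges_deg)
    rr = pair_counts(randoms, edges_deg)
    nd, nr = len(data), len(randoms)
    # Rescale for the different numbers of possible pairs in the two samples.
    norm = (nr * (nr - 1)) / (nd * (nd - 1))
    return [norm * d / r - 1.0 if r > 0 else float("nan") for d, r in zip(dd, rr)]
```

The double loop costs a time proportional to the square of the sample size, which is exactly what becomes prohibitive for millions of objects; a tree-based code replaces the inner loop by counting whole groups of points at once whenever a pair of tree nodes falls entirely inside one bin.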

*http://www.star.bris.ac.uk/~mbt/topcat/

The next generation of tools

We are already working on the next generation of tools that will be offered as a service.

To have a complete view of our Universe, one needs to be able to measure how distant galaxies
are, which can be achieved by measuring their redshift. However, spectroscopic datasets are
difficult to obtain for a complete sample of the sky, and we have to rely on photometric methods to
determine the redshift (SED fitting, neural networks, etc.). We are adapting some existing
codes to enable "on-the-fly" photometric redshift determination, run remotely on our database.
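
For readers unfamiliar with SED fitting, the principle can be sketched with a toy example: model fluxes are computed on a grid of redshifts and templates, and the combination with the lowest chi-square is kept. Everything specific below is an assumption made for illustration (the band wavelengths are approximate, the two templates are invented smooth-break spectra, and `photo_z` is a hypothetical name); it is not one of the existing codes being adapted.

```python
import math

# Approximate effective wavelengths (Angstrom) of five optical bands; illustrative values.
BANDS = [3550.0, 4770.0, 6230.0, 7620.0, 9130.0]

# Toy templates: (spectral slope, depth of a smooth break near 4000 Angstrom).
# These are hypothetical shapes, not real galaxy spectra.
TEMPLATES = {"red": (1.0, 0.7), "blue": (-1.0, 0.3)}

def template_flux(template, rest_wavelength):
    """Flux of a toy template at a rest-frame wavelength in Angstrom."""
    slope, depth = template
    step = 1.0 / (1.0 + math.exp(-(rest_wavelength - 4000.0) / 300.0))
    return (rest_wavelength / 4000.0) ** slope * (1.0 - depth + depth * step)

def model_fluxes(template, z):
    """Template fluxes in the observed bands for a source at redshift z."""
    return [template_flux(template, lam / (1.0 + z)) for lam in BANDS]

def photo_z(fluxes, errors, z_max=2.0, z_step=0.01):
    """Grid search over redshift and template; returns (z, template_name, chi2)."""
    weights = [1.0 / (e * e) for e in errors]
    best = None
    for name, template in TEMPLATES.items():
        for k in range(int(round(z_max / z_step)) + 1):
            z = k * z_step
            model = model_fluxes(template, z)
            # The best-fitting amplitude has a closed form for a linear scale factor.
            num = sum(w * f * m for w, f, m in zip(weights, fluxes, model))
            den = sum(w * m * m for w, m in zip(weights, model))
            amp = num / den
            chi2 = sum(w * (f - amp * m) ** 2 for w, f, m in zip(weights, fluxes, model))
            if best is None or chi2 < best[2]:
                best = (z, name, chi2)
    return best
```

Since each object is fitted independently, such a grid search parallelizes trivially over a catalog, which is what makes a remote "on-the-fly" service plausible.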

Galaxies live in more or less dense structures that have a huge impact on their evolution.
However, these structures are not always straightforward to extract and require well-designed
algorithms. We are developing group and cluster finders that will work "on-the-fly" remotely.
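
The algorithm behind these finders is not specified here. Purely as an illustration of what a group finder does, the sketch below uses the familiar friends-of-friends approach on sky positions only: two galaxies closer than a linking length are "friends", and groups are the connected sets of friends. The function names, the linking length, and the neglect of redshift information are all assumptions of this example.

```python
import math

def ang_sep_deg(p, q):
    """Angular separation in degrees between two (ra, dec) positions (haversine formula)."""
    ra1, dec1 = math.radians(p[0]), math.radians(p[1])
    ra2, dec2 = math.radians(q[0]), math.radians(q[1])
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

def find_groups(points, linking_deg, min_members=2):
    """Friends-of-friends grouping; returns sorted lists of member indices."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pair loop for clarity; a spatial index would replace it at scale.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if ang_sep_deg(points[i], points[j]) <= linking_deg:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_members)
```

Note that members of a group need not all be within the linking length of each other: a chain of close neighbors is enough, which is why the result depends strongly on the chosen linking length.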

We are also collaborating closely with computer scientists, who will apply algorithms developed
for Data Mining to help the astronomical community make the best of the Data Center.

RAISING INTERESTS AND SKILLS FOR VIRTUAL OBSERVATORIES IN TAIWAN 

The most important role of the TWEA-DC for the next couple of years will be to prepare and
train the current and next generations of astronomers for the astronomy of the future. The scale
of the datasets and the complexity of the data themselves will require a new type of astronomer,
familiar with computer science and able to collaborate fully with specialists in data mining.
This problem has been known for decades (it is sometimes called the Fourth Paradigm, the first
three being observation, theory and simulations), and astronomers all around the globe have
already developed these skills and competences (as testified by the International Virtual
Observatory Alliance).

The astronomical community in Taiwan is aware of the urgent need to join the global effort. However,
the rather small size of the community has so far prevented individual efforts from being successful
in the long term. The creation of a local Data Center will provide a focal point around which such
efforts can be maintained. In parallel with software development, we will organize training
activities dedicated to the astronomical community. We will organize workshops in collaboration with
the Department of Computer Science and Engineering, to which we are also planning to invite foreign
specialists in the VO and data mining in astronomy. In the longer term, we would like to host an
IVOA Interoperability Meeting or an Astronomical Data Analysis Software and Systems (ADASS)
conference in Taiwan. The current generation of students has to be prepared to tackle the new
generation of datasets. As part of their training program, we will offer courses in collaboration
with our computer scientist colleagues, and we will organize Student Summer Schools in Taiwan,
inviting international specialists.

THE FIRST YEAR 

The current version of the TWEA-DC gathers 48Tb of data storage. The Data Center has been
funded exclusively by the National Taiwan Normal University. The structure of the TWEA-DC is
tailored for rapid data access and is composed of a server, a data storage unit and a mirror backup
system (see Fig. 2). The communication speed between the server and the data units is 4Gb/s,
while the backup system operates at a rate of 1Gb/s. The Data Center is being set up by a team of
our graduate students from NTNU. They are responsible for implementing the hardware, software,
security, logging, and backup systems. They will also set up the database and create the web
interface. The team works under our supervision, in consultation with computer scientists. We expect
the Data Center to be ready by July 2011. On-the-fly cross-matching between the major public
multiband catalogs (SDSS, UKIDSS, CFHTLS, etc.), as well as against private catalogs, will be the
first service available. We expect to release our database and the associated tools to the
community by the end of 2011.
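
To give a feeling for what these link speeds mean in practice, the short calculation below estimates how long it would take to move the full storage volume over each link. It assumes that "Tb" denotes decimal terabytes, that "Gb/s" denotes gigabits per second, and that the links are used at their full nominal rate; none of these assumptions is stated explicitly above, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(volume_terabytes, rate_gigabits_per_s):
    """Hours needed to move a volume at a sustained nominal link rate."""
    bits = volume_terabytes * 1e12 * 8.0      # assumed: 1 Tb = 10^12 bytes
    seconds = bits / (rate_gigabits_per_s * 1e9)
    return seconds / 3600.0

# Full 48 Tb volume over the server link (4 Gb/s) and the backup link (1 Gb/s).
server_link_hours = transfer_hours(48.0, 4.0)
backup_link_hours = transfer_hours(48.0, 1.0)
print(f"server link: {server_link_hours:.1f} h, backup link: {backup_link_hours:.1f} h")
```

Under these assumptions a complete mirror of the storage takes about four and a half days over the backup link, so in steady state only incremental changes can realistically be mirrored.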

Figure 2. Current structure of the TWEA-DC.



CONCLUSION 

The Taiwan Extragalactic Astronomical Data Center will be available to the worldwide community
by the end of 2011. This effort, led by the National Taiwan Normal University, constitutes the
first stepping-stone in building a National Virtual Observatory in Taiwan. Its main goals will be to
make available the most important public datasets along with a very fast catalog matching tool,
allowing "on-the-fly" matching. The TWEA-DC will also provide a user-friendly portal
for easy access to reducible data from worldwide archives, and will provide dedicated high-end
analysis tools. Finally, the TWEA-DC will help to train the Taiwanese community, and will act as a
bridge between local astronomers, local computer scientists and the global VO community.